{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_01_2_genai.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 1: Course Overview**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 1 Material\n",
    "\n",
    "* Part 1.1: Course Overview [[Video]](https://www.youtube.com/watch?v=OVS-6s20Ms0) [[Notebook]](t81_559_class_01_1_overview.ipynb)\n",
    "* **Part 1.2: Generative AI Overview** [[Video]](https://www.youtube.com/watch?v=ohmPaSsKhMs) [[Notebook]](t81_559_class_01_2_genai.ipynb)\n",
    "* Part 1.3: Introduction to OpenAI [[Video]](https://www.youtube.com/watch?v=C2xyi2Cq-bU) [[Notebook]](t81_559_class_01_3_openai.ipynb)\n",
    "* Part 1.4: Introduction to LangChain [[Video]](https://www.youtube.com/watch?v=qQI5AhaKxuI) [[Notebook]](t81_559_class_01_4_langchain.ipynb)\n",
    "* Part 1.5: Prompt Engineering [[Video]](https://www.youtube.com/watch?v=_Uot1i5sIXo) [[Notebook]](t81_559_class_01_5_prompt_engineering.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
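    "The following code checks whether the notebook is running in Google CoLab and sets the `COLAB` flag accordingly. Inside CoLab, it also imports the `drive` module, which you can use to mount Google Drive if needed."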
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "cbc72cd3-dfc0-4ba1-b88d-e654b514bba6"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: not using Google CoLab\n"
     ]
    }
   ],
   "source": [
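    "try:\n",
    "    from google.colab import drive  # only importable inside CoLab\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except ImportError:\n",
    "    # Outside CoLab the google.colab package is not installed.\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False"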
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# Part 1.2: Generative AI Overview\n",
    "\n",
    "Generative AI refers to a subset of artificial intelligence technologies that can create new content that did not exist before, ranging from text, images, and video to music and code. Traditional AI systems are primarily designed to analyze data and make predictions or decisions based on that data. Generative AI, by contrast, focuses on the creative aspect, using patterns learned from vast datasets to produce original outputs.\n",
    "\n",
    "Generative AI is built on algorithms known as generative models. These models are trained on extensive collections of data from a specific domain, learning the underlying structure and distribution of that data. Once trained, they can generate new instances that are similar to, but not identical to, the training data, loosely mimicking human creativity at a scale and speed that humans alone cannot match.\n",
    "\n",
    "In the landscape of modern generative AI, text-to-image and text-to-text models have reshaped what is possible in creative and analytical domains. At the forefront of these innovations are OpenAI's DALL·E and GPT models, which produce detailed images from textual descriptions and generate human-like text, respectively.\n",
    "\n",
    "Text-to-image models such as DALL·E have changed how we think about visual content creation, generating detailed and contextually relevant images from simple text prompts. This gives artists, designers, and content creators new ways to explore complex concepts and visualizations that were previously beyond reach.\n",
    "\n",
    "At the same time, text-to-text models like GPT have become capable of holding coherent, context-aware conversations, writing essays, and even writing code.\n",
    "\n",
    "These models rely on vast amounts of training data and sophisticated neural network architectures to capture nuances in language, which makes them valuable for a wide range of applications, including content creation, customer service automation, and education.\n",
    "\n",
    "Together, these advancements represent a significant shift in generative AI, moving beyond the adversarial training of generative adversarial networks (GANs) and the encoding-decoding focus of variational autoencoders (VAEs), both of which are described in the history below.\n",
    "\n",
    "Generative AI has found applications in numerous fields and is changing how content is created. In the creative industries, it generates art, music, and literary works, giving artists and creators new tools for expression. In technology and software development, it can generate code and automate programming tasks. In business, it produces marketing copy, reports, and even synthetic data for training other AI models. This flexibility and power make generative AI an active area of research and development that continues to transform industries and expand the boundaries of what machines can create.\n",
    "\n",
    "## History of Generative AI\n",
    "\n",
    "The journey of generative AI mirrors the broader trajectory of advances in artificial intelligence and deep learning. From the start, generative AI has sought to mimic and even enhance the creative capacities of the human mind, and researchers have pursued this goal by building increasingly sophisticated models and techniques.\n",
    "\n",
    "The story begins with the advent of deep learning, a subset of machine learning in which artificial neural networks, loosely inspired by the structure and function of the human brain, learn from large amounts of data. Deep learning marked a significant departure from traditional machine learning methods, enabling computers to recognize patterns in raw data and make decisions with minimal human intervention. This breakthrough laid the foundation for generative AI by providing the tools needed to process and generate complex forms of data, such as images, text, and sound.\n",
    "\n",
    "Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and his colleagues in 2014, were a significant milestone in the field. A GAN pits two neural networks against each other: one network (the generator) produces data, and the other (the discriminator) tries to tell generated data from real data. As each network improves against the other, the quality of the generated outputs rises. This approach proved highly effective for generating realistic images, video, and even music, showing that generative AI can create content that is often difficult to distinguish from human-produced work.\n",
    "\n",
    "In parallel with GANs, Variational Autoencoders (VAEs) offered another pathway: they encode input data into a lower-dimensional latent space and then decode points from that space to generate new data. While VAEs and GANs advanced the field, they were primarily applied to images and similar data types, leaving a gap in text generation.\n",
    "\n",
    "The introduction of advanced text-to-text models, such as OpenAI's GPT series, marked the next leap forward. Based on the Transformer architecture introduced in 2017, these models demonstrated an unprecedented ability to generate coherent and contextually rich text. By training on diverse and extensive internet text, these models could perform various language tasks, from translation to question-answering and essay writing, with surprising fluency and creativity.\n",
    "\n",
    "Following the success in text generation, the AI field witnessed the emergence of text-to-image models, such as DALL·E, which combined the descriptive power of natural language with the generative capabilities of neural networks to create detailed and contextually relevant images from textual descriptions. This evolution illustrated the maturation of generative AI from its deep learning roots to its current status, where it can mimic and augment human creativity across various mediums.\n",
    "\n",
    "Today, generative AI reflects the remarkable progress in artificial intelligence, blending technical sophistication with creative potential. From deep learning's pattern recognition to the adversarial training of GANs, the fluent text of GPT, and the imaginative images of DALL·E, the evolution of generative AI continues toward machines that can think, create, and innovate alongside humans.\n",
    "\n",
    "## Text to Text Models\n",
    "\n",
    "\n",
    "Text-to-text language models represent a transformative advancement in the field of generative artificial intelligence, offering a broad spectrum of applications that span from automated writing assistance to sophisticated conversational agents. At their core, these models process input text and produce output text corresponding to the task at hand, whether translating between languages, summarizing lengthy documents, or generating creative content. Built upon deep learning algorithms, these models are trained on vast corpora of text data, enabling them to grasp and replicate the nuances of human language. Their ability to understand context and generate coherent, contextually relevant text makes them versatile tools in both professional and creative domains.\n",
    "\n",
    "[ChatGPT](https://chat.openai.com/), developed by [OpenAI](https://openai.com/), is a prominent example of a text-to-text language model; it has captured the public's imagination and can respond to a wide variety of prompts. [[Cite:mann2020language]](https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html) Whether users seek detailed explanations, creative storytelling, or technical assistance, ChatGPT draws on extensive training on diverse internet text to deliver responses that often resemble human understanding and expression. This conversational model demonstrates the practical utility of generative AI in everyday interactions and highlights its potential to augment human capabilities, foster creativity, and make information accessible across language barriers. Figure 1.TXT2TXT shows ChatGPT at work.\n",
    "\n",
    "**Figure 1.TXT2TXT: Text to Text Model**\n",
    "![Text to Text Model](https://data.heatonresearch.com/images/wustl/app_genai/model_text2text.jpg \"Text 2 Text Model\")\n",
    "\n",
    "Text-to-text models, which have revolutionized natural language processing (NLP), owe much of their success to a neural network architecture known as the Transformer. Introduced in the seminal paper \"Attention Is All You Need\" by Vaswani et al. in 2017, the Transformer abandoned the previous reliance on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) for processing sequential data. [[Cite:NIPS2017_3f5ee243]](https://arxiv.org/abs/1706.03762) Instead, it introduced an architecture built entirely around attention mechanisms, which let the model weigh how relevant each part of the input is to every other part. This approach significantly improved the model's ability to handle long-range dependencies in text, a challenge that had hampered earlier models.\n",
    "\n",
    "Embeddings and attention mechanisms are at the heart of Transformer-based models, including text-to-text frameworks. Embeddings are high-dimensional vectors representing words or tokens in a continuous vector space. These embeddings capture semantic meanings and relationships between words, allowing the model to process and understand input text. The attention mechanism, particularly the self-attention variant used in Transformers, allows the model to dynamically focus on different parts of the input sequence when producing an output, effectively determining which parts of the input are most relevant for a given task. This development was crucial for understanding the context and nuances of language, as it enables the model to make connections between distant words in a sentence or document.\n",
    "\n",
    "The \"Attention Is All You Need\" paper not only introduced the Transformer architecture but also laid the foundation for subsequent developments in NLP. Thanks to their scalability and efficiency, Transformers have become the basis for many advanced language models, including BERT, the GPT series, and T5. These models are trained on vast datasets and can perform various language tasks, from translation and summarization to question answering and text generation, without requiring task-specific architectures. The Transformer's ability to process sequences in parallel, its scalability, and its effective handling of long-range dependencies have made it a cornerstone of modern NLP research and applications. Figure 1.TRANSFORMER shows the primary components of a Transformer model.\n",
    "\n",
    "**Figure 1.TRANSFORMER: Transformer Model**\n",
    "\n",
    "![Transformer Model](https://data.heatonresearch.com/images/wustl/app_genai/model_transformer.jpg \"Transformer Model\")\n",
    "\n",
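    "To make embeddings and self-attention more concrete, the following cell is a minimal NumPy sketch of single-head scaled dot-product self-attention, computing `softmax(Q K^T / sqrt(d_k)) V` as described in the Transformer paper. The helper functions `softmax` and `self_attention` are defined here for illustration and are not part of any library. The four-word vocabulary, the embedding size of 8, and the random weight matrices are also illustrative assumptions: in a real model the embedding table and the query, key, and value weights are learned during training, and GPT-style models additionally mask out future tokens, which this sketch omits.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def softmax(x, axis=-1):\n",
    "    # Subtract the row maximum for numerical stability before exponentiating.\n",
    "    e = np.exp(x - x.max(axis=axis, keepdims=True))\n",
    "    return e / e.sum(axis=axis, keepdims=True)\n",
    "\n",
    "\n",
    "def self_attention(X, W_q, W_k, W_v):\n",
    "    # Single-head scaled dot-product self-attention (no causal mask).\n",
    "    Q = X @ W_q  # queries: what each token is looking for\n",
    "    K = X @ W_k  # keys: what each token offers to others\n",
    "    V = X @ W_v  # values: the content that gets mixed together\n",
    "    d_k = K.shape[-1]\n",
    "    scores = Q @ K.T / np.sqrt(d_k)  # (tokens, tokens) relevance scores\n",
    "    weights = softmax(scores, axis=-1)  # each row sums to 1\n",
    "    return weights @ V, weights\n",
    "\n",
    "\n",
    "# Hypothetical toy setup: a 4-word vocabulary and random, untrained weights.\n",
    "rng = np.random.default_rng(0)\n",
    "vocab = {\"the\": 0, \"cat\": 1, \"sat\": 2, \"down\": 3}\n",
    "d_model = 8\n",
    "embedding_table = rng.normal(size=(len(vocab), d_model))  # one vector per token\n",
    "\n",
    "tokens = [\"the\", \"cat\", \"sat\", \"down\"]\n",
    "X = embedding_table[[vocab[t] for t in tokens]]  # (4, 8) input embeddings\n",
    "\n",
    "W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))\n",
    "output, weights = self_attention(X, W_q, W_k, W_v)\n",
    "\n",
    "print(output.shape)  # (4, 8): one context-aware vector per token\n",
    "print(weights.round(2))  # row i: attention that token i pays to each token"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the weights here are random, the printed attention pattern carries no linguistic meaning. Training adjusts the embedding table and the `W_q`, `W_k`, and `W_v` matrices so that tokens relevant to one another receive higher attention weights. Dividing by `sqrt(d_k)` keeps the dot products from growing with the vector size, which would otherwise push the softmax toward extreme, nearly one-hot weights.\n",
    "\n",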
    "## Text to Image Models\n",
    "\n",
    "[Text-to-image](https://en.wikipedia.org/wiki/Text-to-image_model) models mark a major leap in generative artificial intelligence, blurring the line between the creativity traditionally attributed to humans and the capabilities of AI. These models interpret textual descriptions and generate corresponding images with impressive accuracy and detail, reflecting an understanding of content, context, and artistic style. By training on vast datasets of images paired with their descriptions, these models learn to recognize and reproduce complex patterns, textures, and styles. Beyond opening new avenues for creative expression, this technology serves practical applications in design, education, and entertainment, turning words into visual narratives.\n",
    "\n",
    "[DALL-E](https://openai.com/dall-e-2), a model developed by OpenAI, is a pioneering example of a text-to-image model. [[Cite:reddy2021dall]](https://www.journal-dogorangsang.in/no_1_NECG_21/14.pdf) Unlike its text-only predecessors, DALL-E interprets input prompts to produce unique, relevant images, bridging the gap between textual and visual imagination. This capability extends generative AI beyond the realm of text into the visual domain, where the implications are as profound as they are captivating, and it illustrates how generative AI can support creative collaboration with humans across diverse fields. Figure 1.TXT2IMG shows DALL-E used through ChatGPT.\n",
    "\n",
    "**Figure 1.TXT2IMG: Text to Image Model**\n",
    "![Text to Image Model](https://data.heatonresearch.com/images/wustl/app_genai/model_text2img.jpg \"Text to Image Model\")\n",
    "\n",
    "\n",
    "Stable Diffusion [[Cite:rombach2022high]](https://arxiv.org/abs/2112.10752) is a significant advancement in generative AI, specifically in text-to-image models. Developed to transform textual descriptions into detailed, high-quality images, Stable Diffusion utilizes deep learning techniques to understand and creatively interpret text prompts, generating visual content that closely aligns with the input descriptions. This model has garnered widespread attention for its ability to produce diverse and complex images across various styles and subjects, democratizing the creation of digital art and enabling a broad spectrum of applications from conceptual design to entertainment.\n",
    "\n",
    "Stable Diffusion and DALL-E both build on a neural network structure known as the [autoencoder](https://en.wikipedia.org/wiki/Autoencoder). An autoencoder learns efficient representations (encodings) of unlabeled data, typically for dimensionality reduction or feature learning. It consists of two main parts: the encoder, which compresses the input data into a compact representation, and the decoder, which reconstructs the input from this compact representation as closely as possible. In Stable Diffusion, the autoencoder compresses images into a compact latent space; a separate text encoder turns the prompt into embeddings that guide a denoising network operating in that latent space, and the decoder then converts the resulting latent representation into a detailed, full-resolution image. Working in this compressed space lets the model capture the essence of a textual prompt and express it visually at a manageable computational cost, which shows how flexible autoencoders are as a building block for bridging text and image generation.\n",
    "\n",
    "Figure 1.DIFFUSION shows the structure of the Stable Diffusion model; note the distinctive hourglass shape of the autoencoder.\n",
    "\n",
    "**Figure 1.DIFFUSION: Text to Image Model**\n",
    "![Stable Diffusion Model](https://data.heatonresearch.com/images/wustl/app_genai/model_diffusion.jpg \"Stable Diffusion Model\")"
   ]
  },
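  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The encoder/decoder \"hourglass\" idea can be sketched with a deliberately tiny linear autoencoder in NumPy. This is an illustrative assumption rather than Stable Diffusion's actual autoencoder, which is a much larger convolutional network trained on images. The toy data, the 8-to-2 bottleneck, the hypothetical helper names `encode` and `decode`, and the shortcut of computing the weights with a singular value decomposition (SVD) instead of gradient descent are all simplifications chosen so the example runs instantly. The shortcut works because a linear autoencoder trained to minimize squared reconstruction error learns the same subspace as principal component analysis (PCA)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical toy data: 200 points in 8 dimensions that actually lie on a\n",
    "# 2-D plane, so a 2-number code should be enough to reconstruct them.\n",
    "rng = np.random.default_rng(42)\n",
    "latent_true = rng.normal(size=(200, 2))\n",
    "mixing = rng.normal(size=(2, 8))\n",
    "data = latent_true @ mixing\n",
    "mean = data.mean(axis=0)\n",
    "\n",
    "# Closed-form weights from the SVD of the centered data (a shortcut in place\n",
    "# of gradient-descent training; the top right-singular vectors span the\n",
    "# subspace an optimal linear autoencoder would learn).\n",
    "_, _, vt = np.linalg.svd(data - mean, full_matrices=False)\n",
    "code_size = 2\n",
    "W_enc = vt[:code_size].T  # (8, 2): encoder squeezes 8 numbers down to 2\n",
    "W_dec = vt[:code_size]    # (2, 8): decoder expands 2 numbers back to 8\n",
    "\n",
    "\n",
    "def encode(x):\n",
    "    return (x - mean) @ W_enc\n",
    "\n",
    "\n",
    "def decode(z):\n",
    "    return z @ W_dec + mean\n",
    "\n",
    "\n",
    "codes = encode(data)            # (200, 2) compact latent representation\n",
    "reconstruction = decode(codes)  # (200, 8) approximation of the input\n",
    "error = np.mean((data - reconstruction) ** 2)\n",
    "print(codes.shape, reconstruction.shape)\n",
    "print(f\"mean squared reconstruction error: {error:.2e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the toy data truly lie on a 2-D plane, the reconstruction error should be essentially zero. Setting `code_size = 1` would narrow the bottleneck below the data's true dimensionality and force the decoder to lose information, which is the same trade-off a real autoencoder makes between compactness and fidelity."
   ]
  },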
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "anaconda-cloud": {},
  "colab": {
   "collapsed_sections": [],
   "name": "t81_558_class_01_1_overview.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
